Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lookup for ratings in schema.org 'aggregateRating' entities listed on page #913

Merged
merged 2 commits into from
Oct 25, 2023
Merged

Add lookup for ratings in schema.org 'aggregateRating' entities listed on page #913

merged 2 commits into from
Oct 25, 2023

Conversation

jknndy
Copy link
Collaborator

@jknndy jknndy commented Oct 24, 2023

Resolves #908

Hey @jayaddison, took a shot at implementing something similar to your author change in #895 for ratings. Let me know how it looks, everything passed on my end.

New testcase data & correction to ingredients implemented in #905
@jknndy jknndy marked this pull request as ready for review October 24, 2023 23:35
Copy link
Collaborator

@jayaddison jayaddison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍 - and the test coverage is there to prove that it's working!

It could be interesting to figure out whether any other entity types in the schema.org test metadata involve @id-based lookups -- in other words, any other situations where we could more easily retrieve recipe info. But it doesn't seem to have caused many problems so far, so it's probably fairly rare.

Comment on lines +66 to +67
rating = self._find_entity(item, "AggregateRating")
if rating:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we could optionally reduce the filesize by a few lines by using the walrus operator here -- it looks like I missed one of out two opporunties to do that previously here. It should be safe to do that because this library requires Python 3.8 or greater.

@jayaddison jayaddison merged commit 7509c98 into hhursev:main Oct 25, 2023
16 checks passed
@jknndy
Copy link
Collaborator Author

jknndy commented Oct 25, 2023

Just looking at the test html from maang it looks like a lot (if not all the fields) are available in the graph element. I'm still planning to look into the schema backup check for all the html scraped fields like you mentioned last week. Wonder if there are sites with the graph element that don't utilize schema that could be simplified by implementing something like those for all fields.

@jayaddison
Copy link
Collaborator

Wonder if there are sites with the graph element that don't utilize schema that could be simplified by implementing something like those for all fields.

Oh, good point - I'm getting JSON-LD (the graph representation) and schema.org (the entity recipes) confused again, as usual. It's possible that we're missing other items too, yep. The extruct code does handle itemprop-based schema.org content, doesn't it?

@jknndy jknndy deleted the Ratings_Lookup branch October 26, 2023 20:36
strangetom pushed a commit to strangetom/recipe-scrapers that referenced this pull request Nov 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

maangchi: aggregate rating data is not retrieved because it has moved into a referenced entity
2 participants